perm filename IIA3.PUB[NSF,MUS] blob
sn#096502 filedate 1974-04-10 generic text, type C, neo UTF8
.SELECT A
3. TOWARDS A GENERAL MODEL FOR SIMULATION
.SELECT C
.GROUP SKIP 1
CURRENT RESEARCH
.SELECT 1
.BEGIN FILL ADJUST
We will here discuss an important aspect of our current research
which is directed toward our ultimate aim: the development of a
general model for the computer simulation of natural tones. The two
approaches to simulation described above, one using additive
synthesis based upon analysis and the other using frequency
modulation synthesis, have not proceeded independently of one
another, but interactions have occurred at several levels. An
example is provided by the initial use of frequency modulation for
the simulation of a brass tone which was strongly influenced by the
research of Risset (1966) on brass tones, using additive synthesis
based on analysis.
The eventual development of a general model for simulation will be an
outgrowth of the interdependency and convergence of these two
approaches. Examples are given where findings using one technique
are applied and tested with the other, providing a cross-verification
of research discoveries. Moreover, the particular advantages of each
technique influence the direction of research using the other
technique, and, in this way, both approaches are in the process of
converging on a single, more powerful and general model. This
resultant model for simulation will have the advantages of both
methods: the simplicity and perceptual meaningfulness of user-control
over the FM technique, and the wide range of complex cases of natural
tones handled by the additive technique.
.GROUP SKIP 2
%5interactions between additive and FM syntheses%1
One example of the interactions which have occurred between the two
approaches discussed above, one using additive synthesis based on the
analysis of real tones and the other using frequency modulation
synthesis, is the initial discovery of the potency of the FM
technique to synthesize periodic music instrument tones. The
translation of the distinctive cues for brass instruments found by
Risset (1966) - which he derived from analysis-based and data-reduced
additive synthesis techniques largely analogous to those which we are
using - into the parameters for FM synthesis resulted in amazingly
successful simulations of brass tones. This provided a confirmation
of the nature of the perceptual features seen for the brass family of
instruments. It also indicated the power of the FM technique to
simulate important perceptual attributes of tone.
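In modern notation, the FM technique referred to here reduces to a single carrier-modulator pair, y(t) = A(t) sin(2πf\c\t + I(t) sin(2πf\m\t)), where the time-varying index I(t) controls the bandwidth. The sketch below is illustrative only; the envelope shapes and frequency values are hypothetical stand-ins, not the parameters used in the simulations described here.

```python
import math

def fm_tone(fc, fm, index_env, amp_env, dur, sr=8000):
    """Generate one FM tone: y(t) = A(t) * sin(2*pi*fc*t + I(t)*sin(2*pi*fm*t)).

    fc        -- carrier frequency in Hz
    fm        -- modulating frequency in Hz
    index_env -- function t -> modulation index I(t)
    amp_env   -- function t -> amplitude A(t)
    """
    n = int(dur * sr)
    samples = []
    for i in range(n):
        t = i / sr
        phase = 2 * math.pi * fc * t + index_env(t) * math.sin(2 * math.pi * fm * t)
        samples.append(amp_env(t) * math.sin(phase))
    return samples

# A brass-like gesture: index and amplitude rise together in the attack,
# so the bandwidth grows with the loudness (hypothetical values).
attack = 0.06
env = lambda t: min(t / attack, 1.0)
tone = fm_tone(fc=440.0, fm=440.0,
               index_env=lambda t: 5.0 * env(t),
               amp_env=env, dur=0.25)
```

Coupling the index envelope to the amplitude envelope, as above, is what gives the characteristic brass-like brightening during the attack.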
Direct interactions between techniques have ensued in our research.
Salient features for several instruments discovered in our approach
using additive synthesis have been translated into the FM technique,
providing a confirmation of the importance of the suggested cues.
Modulations which occur in certain brass instruments, especially the
French horn, have been found to be critical using both methods of
synthesis. The successful simulation of the violin tone using
additive synthesis was found to necessitate the preservation of the
inharmonicity among the partials in the attack. The application of
this finding to FM synthesis was noted above, and puts a new level of
timbral complexity within the reach of the latter method. The
success found in using any sort of inharmonicity for the simulation
of the quality of the violin attack gives us insight into the
critical feature of that attack and the range of alternative
techniques which can be used to generate it - none of which
duplicates the actual acoustical waveform of the real tone! Many
other applications of findings from the additive approach to FM are
in progress, which also include the reed family of instruments.
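The inharmonicity cue discussed above can be illustrated by a minimal additive-synthesis sketch: a handful of partials begin slightly detuned from exact harmonics and relax to harmonicity over the attack. The detuning amounts and the envelope are hypothetical; only the mechanism is the point.

```python
import math

def additive_tone(f0, n_partials, dur, sr=8000, attack=0.05, detune=0.01):
    """Sum of partials whose frequencies start slightly inharmonic and
    relax to exact harmonics of f0 over the attack segment."""
    n = int(dur * sr)
    # Per-partial detuning factors during the attack (hypothetical values).
    offsets = [detune * ((-1) ** k) * k for k in range(1, n_partials + 1)]
    samples = []
    phases = [0.0] * n_partials
    for i in range(n):
        t = i / sr
        blend = min(t / attack, 1.0)   # 0 -> fully detuned, 1 -> harmonic
        amp = blend                    # simple linear attack envelope
        s = 0.0
        for k in range(n_partials):
            harmonic = f0 * (k + 1)
            freq = harmonic * (1.0 + (1.0 - blend) * offsets[k])
            phases[k] += 2 * math.pi * freq / sr   # integrate instantaneous frequency
            s += (amp / (k + 1)) * math.sin(phases[k])
        samples.append(s)
    return samples

tone = additive_tone(f0=440.0, n_partials=6, dur=0.2)
```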
We find special significance in the convergence of the two
approaches, as it relates to the development of a more general model
for simulation. The simplicity and perceptual meaningfulness of
specifications to the frequency modulation technique point out an
important goal for the additive synthesis method. On the other hand,
the complexities of tone which are revealed by analysis, and which
are confirmed to be perceptually salient in the additive synthesis,
point out necessary levels of complexity which must be accommodated by
the frequency modulation technique. As the latter technique is then
made more complex, it in fact enters the category of additive
synthesis. The ultimate model for simulation will draw from the
research findings using both methods.
.END
.GROUP SKIP 2
.SELECT C
PROPOSED RESEARCH
.GROUP SKIP 1
.SELECT 1
.BEGIN FILL ADJUST
We will here discuss the proposed research which is centrally
concerned with approaches to our ultimate aim: the development of a
powerful, general, and easily-controlled algorithm for simulation
which is based on a comprehensive perceptual model for natural tones.
We first will mention our intention to explore the possible use of
subtractive synthesis in the simulation of natural music instrument
tones. We anticipate that this method will be useful for the
simulation of percussion instrument tones such as drum and cymbal. In
these cases, one advantage of the subtractive method is that
inharmonic partials or even wide-band noise may be easily introduced
into the sound. When simulating instruments with certain fixed
resonances, one or more filters could be positioned at these
resonances regardless of the fundamental pitch period of the exciting
waveform.  Other applications will be explored, and a significant
third approach may result, one with advantages beyond those of the
two approaches above, additive and frequency modulation synthesis,
and one which would be integrated into our approach towards a
general model for simulation.
We will next describe experimental procedures used to investigate the
perceptual processing of instrument tones, since, to assist in the
development of a general simulation algorithm, we must formulate a
general model for the perception of timbre. This will provide
important information for the construction of perceptually-based
higher-order simulation algorithms. We employ a spatial model for
the subjective structure of the perceptual relationships between
signals. In particular, multidimensional scaling techniques will be
discussed. Research is directed at uncovering the dimensionality of
the subjective space, the psychophysical relationships which are
structurally correlated to this space, and the properties of the
space. The existence of such constraints as categorical boundaries
will be investigated in an attempt to assess the continuity of the
subjective space for timbre. Of interest is the possible existence
of a categorical mode of perception for musical sounds, as has been
claimed for speech (Liberman et al., 1967).  In the same regard,
we will also examine the effects of musical training or context on
the structure of the space. The model will be evaluated by our
ability to predict the mappings of real and novel tones. For the
purpose of investigating the properties of a subjective space for
timbre, and for testing the existence of a categorical mode of
perception, we are designing algorithms which produce new tones whose
physical properties lie between two known music instrument tones. An
example of one algorithm, designed for additive synthesis based upon
the data-reduced analysis of real tones, is shown in Figure 13, and
can be heard in Recorded Example 3.
Based on these findings, we plan the design of an algorithm which maps
the dynamic spectra of real tones into the FM parameters and time
functions. The ability to write such an algorithm would indicate our
success at having identified the perceptual dimensions of timbre.
This is an important step in the convergence of synthesis techniques
toward our ultimate goal stated above.
.END
.SELECT 5
.GROUP SKIP 2
exploration of subtractive synthesis techniques
.SELECT 1
.BEGIN FILL ADJUST
A form of synthesis which we have not yet discussed, and which is in
fact the only class of sound synthesis not yet covered by our
research, is that of subtractive synthesis.  The procedure here is to
take a simple signal with a wide bandwidth, such as a pulse train or
a band-limited sawtooth wave, and apply spectral shaping filters to
produce the desired partial tone amplitudes. We have not as yet used
this form of synthesis, but intend to do so in the near future. This
is the type of synthesis most commonly used in vocoders. We may thus
assimilate the techniques of analysis and synthesis of human speech
and apply them in a more general context. Two of the most useful
methods seem to be the linear predictor (Atal & Schroeder, 1970; Atal
& Hanauer, 1971; Markel, 1972) and the homomorphic vocoder (Oppenheim
& Schafer, 1968; Oppenheim, 1969; Miller, 1973).
Generally, the technique would be as follows. A musical
instrument tone would be analyzed at discrete intervals, for instance,
every 5 milliseconds. At each analysis point, we compute a filter
whose frequency response approximates the spectral shape of the input
waveform in the interval around the analysis point. To resynthesize
the signal, we filter a pulse train, updating the filter parameters
at each analysis point. In the case of the linear predictor, the
filter is an all-pole filter. For the homomorphic vocoder, the filter
is an all-zero filter. Since these methods are well documented in the
literature, we shall not explain them here.
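The frame-by-frame resynthesis loop just described can be sketched as follows. In place of the higher-order all-pole filter that a linear predictor would supply, this illustration uses a single two-pole resonator whose center frequency and bandwidth are updated once per frame; all numerical values are hypothetical.

```python
import math

def two_pole_coeffs(center, bandwidth, sr):
    """Coefficients of a two-pole resonator: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth / sr)        # pole radius from bandwidth
    a1 = 2 * r * math.cos(2 * math.pi * center / sr)
    a2 = -r * r
    return a1, a2

def subtractive_tone(f0, frames, frame_len, sr=8000):
    """Filter a pulse train, updating resonator parameters once per frame.

    frames -- list of (center_freq_hz, bandwidth_hz) pairs, one per frame
    """
    period = int(sr / f0)
    y1 = y2 = 0.0
    out = []
    n = 0
    for center, bw in frames:
        a1, a2 = two_pole_coeffs(center, bw, sr)   # update at each analysis point
        for _ in range(frame_len):
            x = 1.0 if n % period == 0 else 0.0    # pulse-train excitation
            y = x + a1 * y1 + a2 * y2
            y2, y1 = y1, y
            out.append(y)
            n += 1
    return out

# Hypothetical 5 ms frames (40 samples at 8 kHz) with a resonance gliding down.
frames = [(1200.0 - 40.0 * k, 150.0) for k in range(10)]
tone = subtractive_tone(f0=200.0, frames=frames, frame_len=40)
```

Adding noise to the excitation `x`, or running several such resonators in parallel at fixed resonance positions, extends the same loop to the percussion and fixed-formant cases discussed below.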
We anticipate that the linear predictor will be useful for analyzing
percussion instrument tones such as drum and cymbal. In these cases,
the excitation might be modeled as band-limited noise, rather than an
impulse train, the spectral shaping being applied by the filter
produced by the linear prediction algorithm. Although the same thing
could be done with the homomorphic vocoder, a difficult convolution
is then required.
The hope is that by giving the time-varying filter both poles and
zeros, a lower-order filter may suffice.  At present, the only way of
automatically producing the parameters for such a filter directly
from digitized sound is by time-consuming optimization techniques. We
propose to determine the parameters initially in much the same way as
one determines the modulation indices for FM instruments. This
involves studying the results of heterodyne filter analysis, the
manual preparation of parameter trajectories, and the testing of the
results by listening to and analyzing the waveform synthesized by the
prepared parameters. We hope to determine whether economical
subtractive synthesis can be realized for a wide variety of
instruments, and whether an efficient automatic method can be
determined for the calculation of time-varying filter parameters.
One advantage of the subtractive method is that inharmonic partials
or even wide-band noise may be introduced into the sound by adding
such excitation to the driving pulse train. The total excitation will
then be passed through the filter and will experience the same
spectral shaping that a pure pulse train would experience.
When simulating instruments with certain fixed resonances, one or
more filters could be positioned at these resonances regardless of
the fundamental pitch period of the exciting waveform. This means
that the size of the multiple tables of parameters as functions of
the fundamental pitch period might be greatly reduced.
In the event that analysis-based subtractive synthesis proves to be a
tool for the extension of the set of tones which we can simulate, it
of course will be included in the research program. The future
research which follows would in that case incorporate the use of
subtractive synthesis in addition to the additive and FM synthesis
techniques that we have found useful to date.
.END
.NEXT PAGE
.SELECT 5
applications of multidimensional scaling to timbre perception
.SELECT 1
.BEGIN FILL ADJUST
A spatial model will be employed to represent the judged
relationships between music instrument tones for the purpose of
uncovering the perceptual dimensions of timbre. If the reader is not
familiar with the computer-based multidimensional scaling techniques
discussed below, it would be most instructive at this time to read
Appendix B, which introduces the basic concepts of multidimensional
scaling and discusses the specific algorithms we will use.
Multidimensional scaling is initially useful for exploring the
psychophysical relationships involved in the perception of timbre,
that is, the relationships between the subjective, psychological
qualities of tones and their physical properties. Interpretation of
the perceptual configuration in terms of physical attributes or an
actual correlation of the subjective dimensions to physical
dimensions in the signals is most desirable. Various attempts have
been made in the recent past. Plomp (1970) and Pols (1970) have
employed the MDSCAL algorithm to investigate the perception of
steady-state auditory signals generated from single periods of
musical tones and vowels. Although several problems exist with the
specific assumptions of their approach, the most unsatisfactory
aspect is the restrictive definition of timbre which excludes the
temporal qualities of sound. Wessel (1973b) has used MDSCAL and
INDSCAL in a study of perceived and imagined relationships for a set
of 9 music instrument tones. Two-dimensional spatial representations
were interpretable with respect to the spectral distribution and
temporal relations in the onsets of components of the tones as
analyzed by speech spectrography.  The results of Wessel's preliminary
study were consistent with a pilot experiment which we conducted with
14 music instrument tones last year. Both indicate great potential
for multidimensional scaling and the fruitfulness of its application
to a larger range of timbres.
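For concreteness, the scaling computation itself can be stated in a short sketch. The version below is classical (Torgerson) multidimensional scaling, a metric relative of the MDSCAL and INDSCAL procedures cited above: the dissimilarity matrix is double-centered, and its dominant eigenvectors, found here by power iteration, give the spatial coordinates. The four-point dissimilarity matrix is a synthetic example chosen to be exactly embeddable in two dimensions, not judgment data.

```python
import math

def classical_mds(D, dim=2, iters=500):
    """Classical (Torgerson) scaling: recover coordinates whose inter-point
    distances reproduce the dissimilarity matrix D (assumed Euclidean)."""
    n = len(D)
    # Double-centered matrix B = -1/2 * J * D^2 * J, with J = I - (1/n)*11^T.
    row = [sum(D[i][j] ** 2 for j in range(n)) / n for i in range(n)]
    tot = sum(row) / n
    B = [[-0.5 * (D[i][j] ** 2 - row[i] - row[j] + tot) for j in range(n)]
         for i in range(n)]
    coords = [[0.0] * dim for _ in range(n)]
    for d in range(dim):
        # Power iteration for the current dominant eigenvector of B.
        v = [math.sin(i + d + 1) for i in range(n)]   # fixed, arbitrary start
        for _ in range(iters):
            w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient gives the (signed) eigenvalue.
        lam = sum(v[i] * sum(B[i][j] * v[j] for j in range(n)) for i in range(n))
        for i in range(n):
            coords[i][d] = v[i] * math.sqrt(max(lam, 0.0))
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                B[i][j] -= lam * v[i] * v[j]
    return coords

# Synthetic dissimilarities: four points at the corners of a 1-by-2 rectangle.
s5 = math.sqrt(5.0)
D = [[0, 1, 2, s5],
     [1, 0, s5, 2],
     [2, s5, 0, 1],
     [s5, 2, 1, 0]]
pts = classical_mds(D)
```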
We feel that there is special potential in the perceptual scaling of
the computer-simulated music instrument tones described above. The
ability to independently control the pitches, loudnesses, and
durations of tones that are synthesized makes it possible to
experimentally account for these dimensions in the stimuli. This is
a non-trivial problem for experiments on timbre perception!
In the case of additive synthesis, tones which are indiscriminable
from the original recordings and which are synthesized from very
reduced data structures give the investigator a powerful advantage in
the interpretation of psychophysical relationships. Not only are the
physical parameters of the signals completely known, but they have
been simplified to the extent that they are more easily dealt with.
Nonessential physical characteristics have been removed
independently, reducing the number of possible physical attributes
which the investigator must consider in an interpretation of the
perceptual space. The preliminary success of filtering experiments
for the localization of perceptual cues for source identification,
mentioned earlier for additive synthesis, suggests the value of an
extended study using a wide set of signals. We plan to study both
confusions in identification and perceived similarity for sets of
instrument tones presented in various conditions, including
filtering.
Likewise, tones generated by frequency modulation synthesis provide a
means of controlling significant physical cues with a small set of
parametric specifications. A rich timbre space results from the
manipulation of a few basic controls which affect the dynamic
evolution, bandwidth, and frequency ratios of the partials of a
synthesized complex tone which has many of the characteristics of
natural tones. Much insight may be gained from the perceptual scaling
of such a simply-specified but multidimensionally rich space. The
extension of FM into wider ranges of the timbre space, including the
non-periodic music instrument tones, provides obvious advantages for
scaling.
Various approaches will be taken to examine specific properties of
the subjective space for timbre. Predictions for mappings of new
tones will be made in order to confirm our interpretation of the
physical correlations to the perceptual space. We are particularly
interested in the perception of tones which the listener might not be
already familiar with, such as real ethnic instruments or syntheses
of tones which have no analog in the real world. The influence of
familiarity with instruments and musical training on the perception
of timbre will be explored.
Another interest is the continuity of the space, which centers on the
nature of the regions which lie in between a mapped set of tones.
Especially important is the consideration of the effects of the
categorical identification of instruments on the perception of
simulated tones which might physically lie in between the known
tones. Equal steps along acoustical dimensions might not map as
equal steps along respective perceptual dimensions, because of the
influence on perception of the tendency to categorize input signals
according to their sources.  We describe below
independent perceptual testing for the existence of a categorical
mode of perception for timbre. Multidimensional scaling will also be
used to explore the space in between the known points.
.END
.NEXT PAGE
.SELECT 5
investigation of categorical perception
.SELECT 1
.BEGIN FILL ADJUST
As an independent test for the possible existence of cognitive
constraints on the perception of simulated music instrument tones,
we will employ procedures to examine the existence of a categorical
mode of perception. Researchers at Haskins Labs report categorical
effects for certain speech sounds. Equal steps along an acoustical
dimension, interpolated between the modeled physical
characteristics of two or more known speech sounds, are employed as
stimuli. Listeners identify the stimuli as falling into definite
categories having narrow overlap. In addition, the discriminability
for pairs of stimuli, all of which are equally separated along the
physical dimension, is affected by their position with respect to
these categorical boundaries. If both members of a pair fall within
a single category, they are discriminated more poorly than if they
fall within different categories. The Haskins group has used this as
evidence for a very special mode of perception for speech, and the
failure of other researchers to find a similar interaction between
categorical identification and discrimination of stimuli is presented
in support of their theory (see Liberman et al., 1967; Liberman,
1972).
We are interested in testing for the existence of categorical effects
in the perception of another set of stimuli which lie in the same
sensory domain and are of complexity comparable to the speech sounds.
Primarily we are concerned with perceptual constraints on a
simulation algorithm, but obvious fallout will occur for general
theories of perception. Our specific procedure will be similar to
the Haskins approach. Identification functions will be compared to
discrimination functions for a set of stimuli consisting of tones
physically interpolated between two known end-points, through a
multidimensional physical space. Discrimination functions will be
derived by means of the `same-different' task discussed above.
We have been able to produce interpolations with both methods of
synthesis. Additive synthesis with data-reduced physical
representations of tones has given us a potential method for
interpolating between known sounds, and the results have been
strikingly successful. The algorithm has been based on the physical
properties of tones.  It consists of interpolations between the
two-dimensional locations (time vs. amplitude or frequency) of joints
in the three line-segment representations of the parallel functions
for two known tones. An example of a set of interpolations between
the violin tone, discussed above, and an alto saxophone tone at the
same pitch and duration is shown in Figure 13. A particularly
significant finding thus far is that the set of tones which are
produced in this manner are identified as being either one or the
other of the two known instruments, and that the categorical
boundaries have been rather sharp. Frequency modulation synthesis
provides another approach for interpolating between sounds. It gives
us a set of parameters which can be systematically altered from one
value to another. Two shapes of control functions, used for
amplitude or index, can be interpolated between in various ways, as
in the manner used for interpolating between functions explained
above for additive synthesis. Successful interpolations have also
been produced with this approach, including interpolations between
periodic and nonperiodic sounds.
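For envelopes with matching numbers of joints, the interpolation algorithm described above reduces to moving each joint linearly between its two known (time, value) positions. A sketch, with hypothetical amplitude envelopes standing in for the violin and saxophone data of Figure 13:

```python
def interpolate_breakpoints(env_a, env_b, alpha):
    """Interpolate between two line-segment envelopes, each a list of
    (time, value) breakpoints with equal numbers of joints, by moving
    every joint linearly between its two known positions.
    alpha = 0 reproduces env_a; alpha = 1 reproduces env_b."""
    assert len(env_a) == len(env_b)
    return [((1 - alpha) * ta + alpha * tb, (1 - alpha) * va + alpha * vb)
            for (ta, va), (tb, vb) in zip(env_a, env_b)]

# Hypothetical amplitude envelopes (time in seconds, relative amplitude)
# for one partial of two tones at the same pitch and duration.
violin = [(0.00, 0.0), (0.08, 1.0), (0.30, 0.8), (0.40, 0.0)]
sax    = [(0.00, 0.0), (0.02, 1.0), (0.35, 0.9), (0.40, 0.0)]

# A five-step series of envelopes from 'violin' to 'sax'.
series = [interpolate_breakpoints(violin, sax, a)
          for a in (0.0, 0.25, 0.5, 0.75, 1.0)]
```

Applying the same joint-by-joint interpolation to every amplitude and frequency function of the two tones, in parallel, yields the intermediate stimuli used in the identification and discrimination tests.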
We must emphasize that we are only in the early stages of exploring
the interpolation between sounds. We are presently employing
algorithms based on the physical structures of tones, and will map
these into a perceptual space, both using multidimensional scaling
and the identification-discrimination procedure to test for
categorical perception. It is clear to us from our investigation to
date that we have uncovered very useful tools for the examination of
properties of the perceptual space for timbre with respect to a
categorical mode of perception.
.END
.GROUP SKIP 2
.SELECT 5
automatic FM mappings from analyzed tones and the convergence of approaches
.SELECT 1
.BEGIN FILL ADJUST
We see the automatic generation of parametric and functional data for
FM simulations of instrument tones to be a significant step towards
the development of a general model for simulation. The potential
increase in facility and knowledge from higher-order algorithms is
great. The difference in the two synthesis techniques provides us
with a powerful means for evaluating hypotheses concerning the
synthesis of timbres because of the difference of spectral control
between the two: the additive technique allows independent control of
the amplitude of each of the components in time, whereas the FM
technique allows control of only the bandwidth.  Based on the models
for timbre perception which we are able to formulate with the use of
multidimensional scaling, we propose to construct algorithms which
map into FM data, first, the reduced data of the additive synthesis
technique and second, the unreduced data from the original analysis.
The difference, if any, between the analytical parts of the two
mapping algorithms for the additive and FM techniques will be forced
to converge where possible.
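One plausible first step in such a mapping algorithm, offered only as a sketch: since the bandwidth of an FM spectrum grows roughly as 2(I + 1)f\m\ (Carson's rule), a frame-by-frame bandwidth measurement taken from the analysis data can be inverted into a time function for the modulation index I. The bandwidth figures below are hypothetical.

```python
def fm_index_from_bandwidth(bandwidth, f_mod):
    """Invert Carson's rule, bandwidth ~ 2*(I + 1)*f_mod, to obtain a first
    estimate of the modulation index I from a measured spectral bandwidth."""
    return max(bandwidth / (2.0 * f_mod) - 1.0, 0.0)

def index_trajectory(bandwidths, f_mod):
    """Map frame-by-frame bandwidth measurements (e.g. from heterodyne
    filter analysis) into a time function for the modulation index."""
    return [fm_index_from_bandwidth(bw, f_mod) for bw in bandwidths]

# Hypothetical bandwidth measurements (Hz) over the attack of a tone
# whose partials are harmonics of 440 Hz (so f_mod = 440).
bandwidths = [880.0, 2200.0, 4400.0, 3520.0, 2640.0]
indices = index_trajectory(bandwidths, 440.0)
```

Such a first estimate would then be refined by listening tests and by re-analysis of the synthesized result, in the manner already used for manually prepared FM parameters.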
Extending the notion of mapping algorithms, and in the light of
interactions which continue to occur between the two approaches to
synthesis, we see the formulation of a most powerful simulation
algorithm which combines the advantages of both approaches. It would
enable the user to realistically simulate any known sound via the
most perceptually relevant parametric specification. The power of
user-control over the FM technique, in that the small number of
parameters and time-functions are of such strong perceptual
importance, provides a model for the optimal level of parametric
specification.  The degrees of freedom of the additive synthesis
technique, which allow the synthesis of any arbitrary configuration
of functions for the amplitudes and phases of the components of
tones, provide the most powerful means of simulating any point in
the physical timbre space.  The course of future research is towards
the convergence of the most powerful attributes of both methods, and
the eventual formulation of an algorithm for simulation which truly
satisfies the criteria we outlined in the introduction of section
II: 1) the optimal use of computer resources, i.e., storage and
efficiency; 2) the perceptual validity of the results in terms of
naturalness; 3) the general applicability to the widest range of
cases found in the repertoire of instrumental timbres; 4) a level
of user-control of the algorithm such that parametric specifications
are perceptually meaningful; and 5) the efficiency with which
hypotheses may be verified.
.END
.NEXT PAGE